Powered by Rmarkdown.
NCT_ID→(JensenLab:Tagger)→DOID
NCT_ID→(AACT)→MeSH
NCT_ID→(NextMove:LeadMine)→SMILES
SMILES→(PubChem)→CID
CID→(PubChem)→INCHIKEY
INCHIKEY→(ChEMBL)→MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID→(ChEMBL)→ACTIVITY_ID
ACTIVITY_ID→(ChEMBL)→TARGET_CHEMBL_ID
TARGET_CHEMBL_ID→(ChEMBL)→COMPONENT_ID
COMPONENT_ID→(ChEMBL)→UNIPROT
ACTIVITY_ID→(ChEMBL)→DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID→(ChEMBL)→PUBMED_ID
aact_studies.tsv
aact_drugs.tsv
aact_descriptions.tsv
aact_drugs_leadmine.tsv
aact_drugs_smi_pubchem_cid.tsv
aact_drugs_smi_pubchem_cid2ink.tsv
aact_drugs_ink2chembl.tsv
aact_drugs_chembl_activity.tsv
aact_drugs_chembl_target_component.tsv
aact_drugs_chembl_document.tsv
pharos_targets.tsv
aact_descriptions_tagger_matches.tsv
diseases_entities.tsv
nct_id is the study ID.
## [1] "Fri Apr 12 11:23:10 2019"
library(readr)
library(data.table)
library(stringr)
library(plotly, quietly=TRUE)
Read file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
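A minimal sketch of the total and unique study counts, assuming a studies table with an `nct_id` column. The toy rows below are hypothetical stand-ins for aact_studies.tsv, which in practice might be loaded with something like `fread("aact_studies.tsv")`.

```r
library(data.table)

# Hypothetical toy rows standing in for aact_studies.tsv (assumed columns).
studies <- data.table(
  nct_id     = c("NCT01", "NCT02", "NCT03"),
  study_type = c("Interventional", "Observational", "Interventional")
)

n_total  <- nrow(studies)            # number of rows (studies)
n_unique <- uniqueN(studies$nct_id)  # number of distinct study IDs
print(sprintf("Total studies: %d ; unique NCT_IDs: %d", n_total, n_unique))
```

Equal totals, as in the report, indicate one row per NCT_ID.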
References of type results_reference may offer stronger evidence and greater confidence.
## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"
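The reference counts could be derived roughly as follows. This is a sketch on hypothetical rows; the column names `nct_id`, `pmid` and `reference_type` are assumptions about the references file, not confirmed by the report.

```r
library(data.table)

# Hypothetical toy references table (assumed column names).
refs <- data.table(
  nct_id         = c("NCT01", "NCT01", "NCT02", "NCT03"),
  pmid           = c(1001L, 1002L, 1002L, 1003L),
  reference_type = c("reference", "results_reference",
                     "results_reference", "reference")
)

# Count the subset of references that report study results.
n_results <- refs[reference_type == "results_reference", .N]
print(sprintf("references: %d; NCT_IDs: %d; PMIDs: %d; results_references: %d",
              nrow(refs), uniqueN(refs$nct_id), uniqueN(refs$pmid), n_results))
```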
Read file of all drugs in AACT.
id is AACT INTERVENTION_ID, corresponding with an instance of a drug, dose, delivery, etc. in a study.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only Interventional studies (study_type) associated with drugs (via NCT_ID).
## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
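The selection of interventional drug studies can be sketched as a filter on `study_type` followed by a membership test against the drugs table. The tables and values below are hypothetical; only `study_type`, `nct_id` and `id` are named in the report.

```r
library(data.table)

# Hypothetical toy tables (assumed layout).
studies <- data.table(
  nct_id     = c("NCT01", "NCT02", "NCT03", "NCT04"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional")
)
drugs <- data.table(
  id     = c(11L, 12L, 13L),
  nct_id = c("NCT01", "NCT01", "NCT04"),
  name   = c("paclitaxel", "cisplatin", "metformin")
)

# Keep interventional studies, then those with at least one drug row.
interventional <- studies[study_type == "Interventional"]
drug_studies   <- interventional[nct_id %in% drugs$nct_id]

print(sprintf("Interventional studies: %d (%.1f%%)", nrow(interventional),
              100 * nrow(interventional) / nrow(studies)))
print(sprintf("Interventional drug studies: %d", nrow(drug_studies)))
```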
| phase | N_studies | N_drugs |
|---|---|---|
| Early Phase 1 | 1574 | 2615 |
| Phase 1 | 23603 | 48593 |
| Phase 1/Phase 2 | 6663 | 13288 |
| Phase 2 | 33910 | 68850 |
| Phase 2/Phase 3 | 3305 | 6503 |
| Phase 3 | 22988 | 49507 |
| Phase 4 | 19593 | 36331 |
| NA | 12785 | 29390 |
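A per-phase summary like the one above might be computed by grouping a study-drug table by `phase`. This sketch assumes N_drugs counts distinct intervention IDs, which is an interpretation rather than something the report states; the rows are hypothetical.

```r
library(data.table)

# Hypothetical study-drug rows: one row per intervention ID (assumed layout).
study_drugs <- data.table(
  nct_id = c("NCT01", "NCT01", "NCT02", "NCT03", "NCT03"),
  phase  = c("Phase 1", "Phase 1", "Phase 2", NA, NA),
  id     = c(11L, 12L, 13L, 14L, 15L)
)

# Distinct studies and distinct intervention IDs per phase; NA forms its own group.
by_phase <- study_drugs[, .(N_studies = uniqueN(nct_id),
                            N_drugs   = uniqueN(id)), by = phase]
print(by_phase)
```

The same grouping with `by = overall_status` would give the status table.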
| overall_status | N_studies | N_drugs |
|---|---|---|
| Active, not recruiting | 6420 | 13962 |
| Completed | 72053 | 145006 |
| Enrolling by invitation | 638 | 1060 |
| Not yet recruiting | 4138 | 8001 |
| Recruiting | 16723 | 33973 |
| Suspended | 463 | 945 |
| Terminated | 10138 | 19618 |
| Unknown status | 10106 | 18463 |
| Withdrawn | 3742 | 6969 |
## Warning: Ignoring 1 observations
## Warning: Ignoring 1 observations
AACT drug names resolved to standard names and structures via SMILES. Note that one name may include multiple chemicals. Now we can use cheminformatically rigorous counts for drugs as active pharmaceutical ingredients (APIs).
## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
| N_mentions | names |
|---|---|
| 2637 | Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol |
| 2545 | CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide |
| 2461 | CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum |
| 2070 | DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone |
| 2054 | CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine |
| 1779 | DOCETAXEL; Docetaxel; docetaxel |
| 1625 | METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine |
| 1540 | GEMCITABINE; Gemcitabine; gemcitabine |
| 1342 | CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda |
| 1178 | Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone |
| 1157 | 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine |
| 1157 | METHOTREXATE; Methotrexate; Metoject; methotrexate |
| 1086 | BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine |
| 1044 | ETOPOSIDE; Etoposid; Etoposide; etoposide |
| 1027 | ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus |
| 978 | NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline |
| 977 | LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine |
| 908 | CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar |
| 903 | COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin |
| 846 | Diprivan; PROPOFOL; Propofol; propofol |
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
## [1] "PubChem SMILES2CID hits: 3933 / 4540 (86.6%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153342"
## [1] "PubChem CIDs with InChIKeys: 3783"
Read Pharos targets (pharos_targets.tsv), for Target Development Level (TDL) and other metadata.
Map compounds to ChEMBL via InChIKeys. Perhaps this should instead use PubChem CIDs and UniChem.
## [1] "ChEMBL compounds mapped via InChIKeys: 3316"
Select only activities with pChEMBL values, both for relevance to protein targets and for confidence.
## [1] "ChEMBL activities: 127943"
## [1] "ChEMBL activities molecules: 2302 ; canonical_smiles: 2302 ; targets: 3877 ; documents: 16959"
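The activity filter amounts to dropping rows whose pChEMBL value is missing. A minimal sketch, with hypothetical identifiers and an assumed `pchembl_value` column name:

```r
library(data.table)

# Hypothetical activities; NA marks activities without a pChEMBL value.
activities <- data.table(
  activity_id        = 1:4,
  molecule_chembl_id = c("MOL_A", "MOL_A", "MOL_B", "MOL_C"),
  pchembl_value      = c(6.5, NA, 7.2, NA)
)

# Keep only activities that carry a pChEMBL value.
activities_p <- activities[!is.na(pchembl_value)]
print(sprintf("ChEMBL activities: %d", nrow(activities_p)))
print(sprintf("ChEMBL activities molecules: %d",
              uniqueN(activities_p$molecule_chembl_id)))
```

Molecules with no qualifying activity (here `MOL_C`) drop out of all downstream target links.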
| assay_type | N_molecule | N_activity |
|---|---|---|
| F:Functional | 1828 | 73811 |
| B:Binding | 1831 | 49891 |
| A:ADMET | 759 | 4058 |
| P:Physicochemical | 44 | 120 |
| T:Toxicity | 28 | 59 |
| U:Unclassified | 3 | 4 |
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1805"
## [1] "Organisms: 187"
| organism | N_targets | Types |
|---|---|---|
| Homo sapiens | 1806 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; PROTEIN-PROTEIN INTERACTION; SELECTIVITY GROUP; SINGLE PROTEIN |
| Rattus norvegicus | 529 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SELECTIVITY GROUP; SINGLE PROTEIN |
| Mus musculus | 238 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Bos taurus | 98 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Sus scrofa | 36 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Cavia porcellus | 26 | SINGLE PROTEIN |
| Escherichia coli K-12 | 19 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Oryctolagus cuniculus | 18 | SINGLE PROTEIN |
| Escherichia coli | 17 | PROTEIN COMPLEX; SINGLE PROTEIN |
| Mycobacterium tuberculosis | 17 | SINGLE PROTEIN |
## [1] "Human targets: 1806"
| idgFamily | N |
|---|---|
| Kinase | 405 |
| Enzyme | 330 |
| GPCR | 158 |
| None | 120 |
| IC | 64 |
| Transporter | 53 |
| Epigenetic | 35 |
| NR | 28 |
| TF | 20 |
| TF; Epigenetic | 3 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
## [1] " Tchem: 767" " Tclin: 342" " Tbio: 105"
## [4] " Tdark: 2"
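The TDL breakdown for single-protein targets could be tallied as below. This is a sketch on hypothetical rows; the column names `target_type` and `idgTDL` are assumed (the latter appears in the report's final table).

```r
library(data.table)

# Hypothetical human targets with target type and TDL (assumed columns).
targets <- data.table(
  uniprot     = c("P_1", "P_2", "P_3", "P_4"),
  target_type = c("SINGLE PROTEIN", "SINGLE PROTEIN",
                  "PROTEIN COMPLEX", "SINGLE PROTEIN"),
  idgTDL      = c("Tchem", "Tclin", "Tchem", "Tchem")
)

# Restrict to single proteins, then count targets per TDL, largest first.
single     <- targets[target_type == "SINGLE PROTEIN"]
tdl_counts <- single[, .N, by = idgTDL][order(-N)]
print(tdl_counts)
```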
Disease mentions are tagged with the JensenLab Tagger, using its DOID entities dictionary, on descriptions from the detailed_descriptions table.
serialno corresponds with DOID.
id is AACT primary key.
Likely false positives, manually removed:
| doid | N_mentions | terms |
|---|---|---|
| DOID:162 | 28596 | CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer |
| DOID:9351 | 17274 | DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus |
| DOID:6713 | 16632 | CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascul… |
| DOID:2030 | 12084 | ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state |
| DOID:1612 | 10583 | BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-canc… |
| DOID:2841 | 10021 | ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; … |
| DOID:3083 | 9782 | CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary dis… |
| DOID:9970 | 9303 | OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity |
| DOID:10763 | 9144 | HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-… |
| DOID:3393 | 6816 | C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart diseas… |
| DOID:0060145 | 6115 | ANALGESIA; Analgesia; analgeSia; analgesia |
| DOID:9352 | 5848 | Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type … |
| DOID:10283 | 5056 | Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; p… |
| DOID:8469 | 4985 | FLU; Flu; Influenza; flu; influenza |
| DOID:225 | 4962 | SYNDROME; Syndrome; syn drome; syndrome |
| DOID:3908 | 4959 | NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lun… |
| DOID:784 | 4841 | CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kid… |
| DOID:5419 | 4689 | SCHIZOPHRENIA; Schizophrenia; schizophrenia |
| DOID:684 | 3836 | HCC; HEPATOCELLULAR CARCINOMA; Hepatocellular Carcinoma; Hepatocellular carcinoma; Hepatoma; hcc; hepato-cellular carcinoma; hepatocellular Carcinoma; hepatocellular carcinoma; hepatoma |
| DOID:5844 | 3664 | Heart Attack; Heart attack; MYOCARDIAL INFARCTION; Myocardial Infarct; Myocardial Infarction; Myocardial infarct; Myocardial infarction; heart attack; myo-cardial infarction; myocardiaL infARction;… |
Sort synonym terms by frequency.
| nct_id | doid | N_mentions | disease_terms |
|---|---|---|---|
| NCT00278330 | DOID:12603 | 1 | acute leukemia |
| NCT00278330 | DOID:8552 | 1 | chronic myelogenous leukemia |
| NCT00278330 | DOID:2355 | 1 | anemia |
| NCT00278330 | DOID:1240 | 1 | leukemia |
| NCT00456742 | DOID:2030 | 3 | Anxiety |
| NCT00456742 | DOID:14320 | 1 | generalized anxiety disorder |
| NCT01092039 | DOID:10459 | 2 | common cold |
| NCT01968083 | DOID:0080327 | 1 | March |
| NCT01968083 | DOID:2942 | 1 | bronchiolitis |
| NCT01968083 | DOID:552 | 1 | pneumonia |
| NCT02625168 | DOID:3908 | 1 | NSCLC |
| NCT02735317 | DOID:0080178 | 3 | mucositis |
| NCT02735317 | DOID:9663 | 2 | Oral Ulcer |
| NCT02735317 | DOID:9261 | 1 | Nasopharyngeal carcinoma |
| NCT03208634 | DOID:6713 | 8 | stroke;Stroke |
| NCT03208634 | DOID:0110066 | 1 | ERS |
| NCT03392246 | DOID:3908 | 3 | NSCLC |
| NCT03392246 | DOID:1324 | 1 | lung cancer |
| NCT03392246 | DOID:162 | 1 | cancer |
| NCT03575156 | DOID:9074 | 6 | SLE;Systemic lupus erythematosus |
| NCT03575156 | DOID:418 | 1 | systemic scleroderma |
| NCT03823092 | DOID:83 | 7 | cataract;Cataract |
| NCT03823092 | DOID:8947 | 2 | diabetic retinopathy |
| NCT03823092 | DOID:10871 | 1 | age related macular degeneration |
| NCT03823092 | DOID:8943 | 1 | LCD |
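The disease_terms column above lists synonyms per study and DOID with the most frequent first. One way to produce such a column, sketched on hypothetical tagger matches with assumed column names:

```r
library(data.table)

# Hypothetical tagger matches: one row per mention (assumed layout).
matches <- data.table(
  nct_id = "NCT_X",
  doid   = c("DOID:6713", "DOID:6713", "DOID:6713", "DOID:0110066"),
  term   = c("stroke", "Stroke", "stroke", "ERS")
)

# Count each distinct term, sort by descending count within study and DOID,
# then collapse the terms into one semicolon-separated string.
term_counts <- matches[, .(n = .N), by = .(nct_id, doid, term)]
setorder(term_counts, nct_id, doid, -n)
summary_tbl <- term_counts[, .(N_mentions    = sum(n),
                               disease_terms = paste(term, collapse = ";")),
                           by = .(nct_id, doid)]
print(summary_tbl)
```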
And include references.
Since each study may be associated with multiple drugs, targets and diseases, we build a table of all associated combinations, then aggregate by study (NCT_ID). For DOIDs with multiple terms, keep only the most common term for simplicity.
## [1] "study-disease links: 237415"
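Reducing each DOID to its most common term could look like the following sketch. The data are hypothetical, and how ties between equally frequent terms are resolved is not specified in the report.

```r
library(data.table)

# Hypothetical mentions across all studies (assumed layout).
mentions <- data.table(
  doid = c("DOID:A", "DOID:A", "DOID:A", "DOID:B"),
  term = c("cancer", "Cancer", "cancer", "asthma")
)

# Term frequency per DOID, most frequent first, then take the first term.
term_freq <- mentions[, .N, by = .(doid, term)]
setorder(term_freq, doid, -N)
top_term <- term_freq[, .(disease_term = term[1L]), by = doid]
print(top_term)
```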
NCT_ID→(NextMove:LeadMine)→SMILES
SMILES→(PubChem)→CID
Keep only studies including both disease and drug mentions.
## [1] "study-drug-disease links: 154971"
## [1] "studies with drug-disease links: 32832"
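Keeping only studies with both disease and drug mentions corresponds to an inner join on NCT_ID, which also enumerates every drug-disease combination within a study. A sketch with hypothetical link tables:

```r
library(data.table)

# Hypothetical study-disease and study-drug link tables.
study_disease <- data.table(
  nct_id = c("NCT01", "NCT01", "NCT02"),
  doid   = c("DOID:A", "DOID:B", "DOID:A")
)
study_drug <- data.table(
  nct_id = c("NCT01", "NCT01", "NCT03"),
  cid    = c(101L, 102L, 103L)
)

# Inner join: studies lacking either a disease or a drug are dropped;
# allow.cartesian permits the many-to-many expansion within a study.
links <- merge(study_disease, study_drug, by = "nct_id",
               allow.cartesian = TRUE)
print(sprintf("study-drug-disease links: %d", nrow(links)))
print(sprintf("studies with drug-disease links: %d", uniqueN(links$nct_id)))
```

Here NCT02 has no drug and NCT03 has no disease, so only NCT01 survives, with 2 x 2 combinations.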
ACTIVITY_ID→(ChEMBL)→TARGET_CHEMBL_ID
TARGET_CHEMBL_ID→(ChEMBL)→COMPONENT_ID
COMPONENT_ID→(ChEMBL)→UNIPROT
## [1] "ACTIVITY_IDs: 127943 ; TARGET_CHEMBL_IDs: 3877 ; pairs: 127943"
## [1] "COMPONENT_IDs: 2535 ; TARGET_CHEMBL_IDs: 2481 ; pairs: 3157"
## [1] "UNIPROTs: 2535 ; SINGLE_PROTEIN UNIPROTs: 2183"
CID→(PubChem)→INCHIKEY
INCHIKEY→(ChEMBL)→MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID→(ChEMBL)→ACTIVITY_ID
## [1] "CIDs: 3783 ; INCHIKEYs: 3781 ; pairs: 3783"
## [1] "INCHIKEYs: 3314 ; MOLECULE_CHEMBL_IDs: 3314 ; pairs: 3316"
## [1] "MOLECULE_CHEMBL_IDs: 2302 ; TARGET_CHEMBL_IDs: 3877 ; ACTIVITY_IDs: 127943 ; DOCUMENT_CHEMBL_IDs: 16959"
## [1] "CID2UNIPROT links: 27008 ; CIDs: 2112 ; UNIPROTs: 2521"
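The CID-to-UniProt links result from chaining the mappings listed above. The sketch below uses hypothetical identifiers and, for brevity, folds the TARGET_CHEMBL_ID→COMPONENT_ID→UNIPROT steps into a single assumed table.

```r
library(data.table)

# Hypothetical mapping tables, one per step in the chain.
cid2ink <- data.table(cid = c(1L, 2L), inchikey = c("INK_A", "INK_B"))
ink2mol <- data.table(inchikey = c("INK_A", "INK_B"),
                      molecule_chembl_id = c("MOL_A", "MOL_B"))
mol2act <- data.table(molecule_chembl_id = c("MOL_A", "MOL_A", "MOL_B"),
                      activity_id = 1:3,
                      target_chembl_id = c("TGT_X", "TGT_X", "TGT_Y"))
tgt2uniprot <- data.table(target_chembl_id = c("TGT_X", "TGT_Y", "TGT_Y"),
                          uniprot = c("P_1", "P_2", "P_3"))

# Join step by step with explicit keys, then reduce to distinct pairs.
chain <- merge(cid2ink, ink2mol, by = "inchikey")
chain <- merge(chain, mol2act, by = "molecule_chembl_id",
               allow.cartesian = TRUE)
chain <- merge(chain, tgt2uniprot, by = "target_chembl_id",
               allow.cartesian = TRUE)
cid2uniprot <- unique(chain[, .(cid, uniprot)])

print(sprintf("CID2UNIPROT links: %d ; CIDs: %d ; UNIPROTs: %d",
              nrow(cid2uniprot), uniqueN(cid2uniprot$cid),
              uniqueN(cid2uniprot$uniprot)))
```

Deduplicating to distinct pairs matters because one compound usually has many activities against the same target.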
## [1] "study-drug-disease-target links: 1725873"
## [1] "studies: 25486 ; drugs: 1560 ; diseases: 1814 ; targets: 2323"
| nct_id | drug_name | cid | disease_term | doid | gene_symbol | uniprot | idgTDL |
|---|---|---|---|---|---|---|---|
| NCT00206414 | Arimidex | 123631 | cancer | DOID:162 | PRKD3 | O94806 | Tchem |
| NCT00614809 | gefitinib | 123631 | unstable angina | DOID:8805 | EPHA1 | P21709 | Tchem |
| NCT00913484 | Disulfiram | 3117 | cocaine dependence | DOID:9975 | ALOX12 | P18054 | Tchem |
| NCT00053248 | arsenic trioxide | 261004 | chronic myelogenous leukemia | DOID:8552 | KMT2A | Q03164 | Tchem |
| NCT02474563 | Prednisone | 5865 | Multiple myeloma | DOID:9538 | KMT2A | Q03164 | Tchem |
| NCT00885443 | Propofol | 4943 | strabismus | DOID:540 | PTGS1 | P23219 | Tclin |
| NCT00465231 | bupivacaine | 3345 | analgesia | DOID:0060145 | MAOB | P27338 | Tclin |
| NCT03467178 | Carboplatin | 5352133 | ovarian cancer | DOID:2394 | OPRK1 | P41145 | Tclin |
| NCT00003545 | nelarabine | 3011155 | T-cell acute lymphoblastic leukemia | DOID:5603 | STK17A | Q9UEE5 | Tchem |
| NCT00080925 | doxorubicin hydrochloride | 443939 | acute lymphoblastic leukemia | DOID:9952 | SLCO1B1 | Q9Y6L6 | Tchem |
| NCT00186875 | Etoposide | 36462 | leukemia | DOID:1240 | SLC6A3 | Q01959 | Tclin |
| NCT02705352 | 5-Fluorouracil | 3385 | ectropion | DOID:1570 | MTOR | P42345 | Tclin |
ACTIVITY_ID→(ChEMBL)→DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID→(ChEMBL)→PUBMED_ID
## [1] "DOCUMENT_CHEMBL_IDs:: 16198 ; PMIDs: 15193"
Evidence weighted by: